The density function starts with d and shows the likelihood of different values x given the distribution parameters. For instance, dnorm(1.85, 1.7, 0.1) gives the likelihood of the value 1.85 given a normal distribution with a mean of 1.7 and a standard deviation of 0.1.
curve(dnorm(x,1.7,.1), # function y = f(x) for value on y axis
1.3, # minimum x
2.1, # maximum x
ylab = "density")
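To make the connection to the formula explicit, the value that dnorm() returns can also be computed by hand from the normal density formula:

```r
# Evaluate the normal density at x = 1.85 for mean 1.7 and sd 0.1
d_builtin = dnorm(1.85, mean = 1.7, sd = 0.1)

# The same value from the density formula
# f(x) = 1/(sigma*sqrt(2*pi)) * exp(-(x - mu)^2 / (2*sigma^2))
d_manual = 1/(0.1*sqrt(2*pi)) * exp(-(1.85 - 1.7)^2 / (2*0.1^2))

round(c(d_builtin, d_manual), 4) # both 1.2952
```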
The random generation function starts with r and generates random values x given the distribution parameters.
For instance, rnorm(500, 1.7, 0.1) will produce 500 random values drawn from a distribution with a mean of 1.7 and a standard deviation of 0.1. We can call these 500 values samples from the distribution.
samples = rnorm(500, 1.7, 0.1)
plot(samples, ylab = "value", xlab = "sample #")
If we have samples and want to see how they are distributed, we can use a histogram or a density plot. It is better to use histograms than density plots, because the latter require estimating a smoothing bandwidth for the density, which can introduce artefacts.
par(mfrow = c(1,2))
hist(samples, main = "Histogram")
plot(density(samples), main = "Density plot")
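We can also check that the sample mean and standard deviation are close to, but not exactly equal to, the distribution parameters (a seed is added here for reproducibility; it is not part of the original chunk):

```r
set.seed(1) # for reproducibility
samples = rnorm(500, 1.7, 0.1)

# The sample statistics approximate the parameters 1.7 and 0.1
c(mean = mean(samples), sd = sd(samples))
```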
It is possible to observe multiple variables as the outcome of one process. For instance, we might record the length and the weight of a newborn baby. To show how these variables are related, we plot them together.
dt = data.frame(length = rnorm(1000,50,5))
expected_weight = 3.5 + scale(dt$length)*.5
dt$weight = rnorm(1000,expected_weight,.5)
par(mfrow = c(1,2))
with(dt,plot(length,weight, pch = 16, cex = .5, main = "Scatter plot"))
with(dt,smoothScatter(length,weight, main = "Smooth scatter plot"))
Another type of process that also leads to jointly distributed variables is Bayesian data analysis. If an analysis estimates several parameters, we can look at the posterior distributions of these parameters individually (their marginal distributions), or we can look at the joint posterior distribution of the parameters. Even though I am referring to posterior distributions of parameters here, we typically look at samples from these posterior distributions.
The typical way to describe variables that are jointly and normally distributed is a multivariate normal distribution. A multivariate normal distribution has parameters you already know, the mean and the variance of each variable, and one additional parameter: the covariance between the variables. The variances and covariances are collected in a covariance matrix.
Here are these parameters for our birth weight and birth length data:
colMeans(dt)
## length weight
## 49.999539 3.506031
cov(dt)
## length weight
## length 27.999168 2.7392792
## weight 2.739279 0.5135685
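As a sketch of how such data could be simulated directly, MASS::mvrnorm() draws samples from a multivariate normal distribution given a mean vector and a covariance matrix. The rounded values below are taken from the output above:

```r
library(MASS) # for mvrnorm()

# Mean vector and covariance matrix, rounded from the output above
mu = c(length = 50, weight = 3.5)
Sigma = matrix(c(28.00, 2.74,
                 2.74,  0.51),
               nrow = 2, byrow = TRUE,
               dimnames = list(names(mu), names(mu)))

# Draw samples from the bivariate normal
set.seed(42)
sim = mvrnorm(10000, mu = mu, Sigma = Sigma)
colMeans(sim) # close to mu
cov(sim)      # close to Sigma
```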
The variances are on the main diagonal of the covariance matrix, and the covariances are on the off-diagonal.
As you probably have guessed, covariance and correlation are linked:
correlation.R = cor(dt)[2]
correlation.manual = cov(dt)[2] / prod(sqrt(diag(cov(dt))))
cbind(correlation.R, correlation.manual)
## correlation.R correlation.manual
## [1,] 0.722378 0.722378
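Base R's cov2cor() performs the same rescaling, converting a covariance matrix into a correlation matrix by dividing by the standard deviations on its diagonal. The data simulation is repeated here so the block stands alone:

```r
set.seed(1)
dt = data.frame(length = rnorm(1000, 50, 5))
dt$weight = rnorm(1000, 3.5 + scale(dt$length)*0.5, 0.5)

# cov2cor() rescales the covariance matrix by the standard deviations
# on its diagonal, which reproduces the correlation matrix
cov2cor(cov(dt))
cor(dt) # identical result
```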
Let's assume the monthly growth rate follows the following distribution:
What is the distribution of heights of 10000 children at different ages?
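The growth distribution itself is not reproduced here; as a sketch, assume a hypothetical Normal(0.5, 0.15) cm of growth per month, so that a child's height is the birth length plus a sum of many small independent monthly increments:

```r
set.seed(1)
n_children = 10000
birth_length = rnorm(n_children, 50, 2)

# Hypothetical monthly growth distribution: Normal(0.5, 0.15) cm/month
# is an assumed stand-in, not the distribution from the text
par(mfrow = c(1, 3))
for (months in c(12, 60, 120)) {
  growth = matrix(rnorm(n_children * months, 0.5, 0.15),
                  nrow = n_children)
  height = birth_length + rowSums(growth)
  hist(height, main = paste(months, "months"), xlab = "height [cm]")
}
```

Each height is a sum of many small independent increments, so the histograms stay bell-shaped while their mean shifts with age.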
This is an example that illustrates the central limit theorem, which states that processes that manifest as the sum of many small, independent and identically distributed events produce normally distributed results.
One way to explain why this is the case is to see that there are more possible combinations of events that lead to average outcomes than combinations of events that lead to extreme outcomes.
For instance, assume that you toss a fair coin four times; each time heads shows you receive one credit point, and each time tails shows you lose a credit point. The next table shows that there are more possible sequences that lead to an end result of 0 credit points than sequences that lead to an extreme result of 4 or -4 credit points.
| Permutation | event 1 | event 2 | event 3 | event 4 | sum |
|---|---|---|---|---|---|
| 1 | -1 | -1 | -1 | -1 | -4 |
| 2 | 1 | -1 | -1 | -1 | -2 |
| 3 | -1 | 1 | -1 | -1 | -2 |
| 4 | 1 | 1 | -1 | -1 | 0 |
| 5 | -1 | -1 | 1 | -1 | -2 |
| 6 | 1 | -1 | 1 | -1 | 0 |
| 7 | -1 | 1 | 1 | -1 | 0 |
| 8 | 1 | 1 | 1 | -1 | 2 |
| 9 | -1 | -1 | -1 | 1 | -2 |
| 10 | 1 | -1 | -1 | 1 | 0 |
| 11 | -1 | 1 | -1 | 1 | 0 |
| 12 | 1 | 1 | -1 | 1 | 2 |
| 13 | -1 | -1 | 1 | 1 | 0 |
| 14 | 1 | -1 | 1 | 1 | 2 |
| 15 | -1 | 1 | 1 | 1 | 2 |
| 16 | 1 | 1 | 1 | 1 | 4 |
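The table above can be reproduced by enumerating all \(2^4\) sequences with expand.grid():

```r
# All 16 possible sequences of four +1/-1 events
perms = expand.grid(event1 = c(-1, 1), event2 = c(-1, 1),
                    event3 = c(-1, 1), event4 = c(-1, 1))
perms$sum = rowSums(perms)

# Count how many sequences lead to each end result:
# the sums -4, -2, 0, 2, 4 occur 1, 4, 6, 4, 1 times
table(perms$sum)
```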
Now let's do the same experiment again, except that we are looking at 16 rather than 4 tosses, which leads to \(2^{16}\) = 65,536 possible sequences. Here is the distribution of credit points.
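Rather than enumerating all 65,536 sequences, the exact distribution can be computed from the binomial distribution: k heads out of 16 tosses give 2*k - 16 credit points, with probability choose(16, k)/2^16.

```r
# Exact distribution of credit points after 16 tosses
k = 0:16
points = 2*k - 16
prob = dbinom(k, 16, 0.5)

barplot(prob, names.arg = points,
        xlab = "credit points", ylab = "probability")
```

The distribution already looks very close to a normal density, as the central limit theorem predicts.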
One popular device to display such a process is a Galton board:
Galton Board
What is the association between length and weight at birth?
We simulate some data:
dt = data.frame(length = rnorm(250,50,5))
expected_weight = 3.5 + scale(dt$length)*.5
dt$weight = rnorm(250,expected_weight,.5)
When data covary, we look at, e.g., a scatter plot, which shows the joint distribution, to see how the data are related.
Sometimes, we want information about only one dimension of the data. This information is shown in the marginal distribution.
One way to view how the marginal distribution is calculated is to imagine that the data points (or samples from a posterior) are collapsed over one variable. Like this:
This is only a visualization to give you an intuition. We won't cover here how one calculates marginal distributions by integration. When we are dealing with samples from posterior distributions, we also do not calculate integrals. In fact, just showing the histogram for one parameter of the posterior already displays this variable's marginal distribution.
Here is a more traditional way to show two marginal distributions.
In this plot, each histogram shows the marginal distribution of length and weight, respectively. E.g., to get the frequency of length = 50 cm, we sum all individuals with this length, regardless of their weight.
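As a sketch, such marginal histograms can be produced with base graphics; the data simulation from above is repeated here with a seed so the block is self-contained:

```r
set.seed(1)
dt = data.frame(length = rnorm(250, 50, 5))
dt$weight = rnorm(250, 3.5 + scale(dt$length)*0.5, 0.5)

# One histogram per margin: each collapses over the other variable
par(mfrow = c(1, 2))
hist(dt$length, main = "Marginal of length", xlab = "length")
hist(dt$weight, main = "Marginal of weight", xlab = "weight")
```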